Cross-Lingual Word Sense Disambiguation for Languages with Scarce Resources

نویسندگان

Bahareh Sarrafzadeh

Nikolay Yakovets

Nick Cercone

Aijun An

چکیده

Word Sense Disambiguation has long been a central problem in computational linguistics. Word Sense Disambiguation is the ability to identify the meaning of words in context in a computational manner. Statistical and supervised approaches require a large amount of labeled resources as training datasets. In contradistinction to English, the Persian language has neither any semantically tagged corpus to aid machine learning approaches for Persian texts, nor any suitable parallel corpora. Yet due to the ever-increasing development of Persian pages in Wikipedia, this resource can act as a comparable corpus for English-

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian

We explore the use of unsupervised methods in Cross-Lingual Word Sense Disambiguation (CL-WSD) with the application of English to Persian. Our proposed approach targets the languages with scarce resources (low-density) by exploiting word embedding and semantic similarity of the words in context. We evaluate the approach on a recent evaluation benchmark and compare it with the state-of-the-art u...

متن کامل

Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian

Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are not available for many languages with scarce resources including Persian. The shortage of efficient, reliable linguistic res...

متن کامل

Let Sense Bags Do Talking: Cross Lingual Word Semantic Similarity for English and Hindi

Cross Lingual Word Semantic (CLWS) similarity is defined as a task to find the semantic similarity between two words across languages. Semantic similarity has been very popular in computing the similarity between two words in same language. CLWS similarity will prove to be very effective in the area of Cross Lingual Information Retrieval, Machine Translation, Cross Lingual Word Sense Disambigua...

متن کامل

LIMSI : Cross-lingual Word Sense Disambiguation using Translation Sense Clustering

We describe the LIMSI system for the SemEval-2013 Cross-lingual Word Sense Disambiguation (CLWSD) task. Word senses are represented by means of translation clusters in different languages built by a cross-lingual Word Sense Induction (WSI) method. Our CLWSD classifier exploits the WSI output for selecting appropriate translations for target words in context. We present the design of the system ...

متن کامل